Access public data using R in Oracle

Access public data using R in Oracle (AFSC only)

Access data via Oracle

If the user has access to the AFSC Oracle database, the user can use SQL developer to view and pull the FOSS public data directly from the RACEBASE_FOSS Oracle schema.

Connect to Oracle from R

Many users will want to access the data from Oracle using R. The user will need to install the RODBC R package and ask OFIS (IT) connect R to Oracle. Then, use the following code in R to establish a connection from R to Oracle:

Here, the user can write in their username and password directly into the RODBC connect function. Never save usernames or passwords in scripts that may be intentionally or unintentionally shared with others. If no username and password is entered in the function, pop-ups will appear on the screen asking for the username and password.

 #' Define RODBC connection to ORACLE
 #'
 #' @param schema default = 'AFSC'. 
 #'
 #' @return oracle channel connection
 #' @export
 #'
 #' @examples
 #' # Not run
 #' # channel <- oracle_connect()
oracle_connect <- function(
    schema='AFSC', 
    username = NULL, 
    passowrd = NULL){(echo=FALSE)
  
  library("RODBC")
  library("getPass")
  if (is.null(username)) {
    username <- getPass(msg = "Enter your ORACLE Username: ")
  }
  if (is.null(password)) {
    password <- getPass(msg = "Enter your ORACLE Password: ")
  }
  channel  <- RODBC::odbcConnect(
    paste(schema),
    paste(username),
    paste(password), 
    believeNRows=FALSE)
  return(channel)
}

channel <- oracle_connect()

Ex. 1: Select all data

Once connected, pull and save (if needed) the table into the R environment.

# Pull table from oracle into R environment
a <- RODBC::sqlQuery(channel, "SELECT * FROM RACEBASE_FOSS.FOSS_CPUE_ZEROFILLED")
# Save table to local directory
write.csv(x = a, file = "RACEBASE_FOSS-FOSS_CPUE_ZEROFILLED.csv")

Ex. 2: Subset data

To pull a small subset of the data (especially since files like RACEBASE_FOSS.FOSS_CPUE_ZEROFILLED are so big), use a variation of the following code. Here, we are pulling EBS Pacific cod from 2010 - 2021:

 # Pull data
a <- RODBC::sqlQuery(channel, "SELECT * FROM RACEBASE_FOSS.FOSS_CPUE_ZEROFILLED 
WHERE SRVY = 'EBS' 
AND COMMON_NAME = 'Pacific cod' 
AND YEAR >= 2010 
AND YEAR < 2021")
 # Save table to local directory
write.csv(x = a, file = "RACEBASE_FOSS-FOSS_CPUE_ZEROFILLED-ebs_pcod_2010-2020.csv")

Join catch and haul data

If this file is too large, you can join catch and haul data. Pull the RACEBASE_FOSS.JOIN_FOSS_CPUE_CATCH and RACEBASE_FOSS.JOIN_FOSS_CPUE_HAUL tables, which are much smaller, and combine the table locally yourself.

To join these tables in Oracle, you may use a variant of the following code:

SELECT * FROM RACEBASE_FOSS.JOIN_FOSS_CPUE_HAUL
FULL JOIN RACEBASE_FOSS.JOIN_FOSS_CPUE_CATCH
ON RACEBASE_FOSS.JOIN_FOSS_CPUE_HAUL.HAULJOIN = RACEBASE_FOSS.JOIN_FOSS_CPUE_CATCH.HAULJOIN;
footer.knit

U.S. Department of Commerce | National Oceanographic and Atmospheric Administration | NOAA Fisheries